Efficient Evaluation of Numerical Preferences: Top k Queries, Skylines and Multi-objective Retrieval
Abstract
Query processing in databases and information systems has developed beyond mere SQL-style exact matching of attribute values. Scoring database objects according to numerical user preferences and retrieving only the top k matches or Pareto-optimal result sets (skyline queries) are already common in a variety of applications. Recently, much of the database literature has focused on how to evaluate queries based on numerical preferences efficiently. Specialized algorithms using either top k retrieval (assuming a single compensation function defined over all query predicates, i.e. a global utility function) or skyline computation (assuming all query predicates to be pairwise incomparable) have been shown to avoid naïve linear database scans by pruning large numbers of database objects, and thus vastly improve scalability. However, these two paradigms are only the extreme cases of exploring viable compromises among a user's objectives, which may or may not be comparable. To find the correct result set for arbitrary cases of multi-objective query processing in databases, a novel algorithm has recently been presented that computes the set of objects that are non-dominated with respect to a set of monotonic objective functions representing a user's notion of utility. Naturally containing the top k and skyline retrieval paradigms as special cases, this algorithm remains scalable for all cases in between. More precisely, in both special cases the multi-objective retrieval algorithm behaves exactly like the most efficient known evaluation algorithms for top k and skyline queries, respectively. The algorithm has also been proved correct and instance-optimal in terms of necessary object accesses. Moreover, it improves the psychological response behaviour by progressively producing result objects as quickly as possible while the algorithm is still running, so users can deal with result objects at the earliest point in time.
Our tutorial will discuss all state-of-the-art algorithms for top k retrieval, skyline queries and multi-objective retrieval, and point to open problems, future extensions of the paradigm and research in numerical preferences. (Dagstuhl Seminar Series Nr 04271, 27.06.-02.07.04. G. Bosi, R. Brafman, J. Chomicki, W. Kießling: Preferences: Specification, Inference, Applications)
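The relationship between the three query types described in the abstract can be illustrated with a minimal Python sketch. It is a naive version that scans every object, not the pruning algorithms the tutorial covers. All names (`top_k`, `dominates`, `skyline`, `non_dominated`) and the sample data are assumptions made for illustration; scores are taken to lie in [0, 1] with higher values preferred.

```python
from heapq import nlargest
from typing import Callable, List, Sequence, Tuple

Obj = Tuple[float, ...]  # one score per query predicate, higher is better


def top_k(objects: Sequence[Obj], utility: Callable[[Obj], float], k: int) -> List[Obj]:
    """Top k retrieval: rank by a single global utility function."""
    return nlargest(k, objects, key=utility)


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Pareto dominance: a is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline(objects: Sequence[Obj]) -> List[Obj]:
    """Skyline: objects not dominated by any other object (naive quadratic scan)."""
    return [o for o in objects if not any(dominates(p, o) for p in objects)]


def non_dominated(objects: Sequence[Obj],
                  objectives: Sequence[Callable[[Obj], float]]) -> List[Obj]:
    """Multi-objective retrieval: objects whose vectors of objective values
    are not dominated by the objective vector of any other object."""
    scored = [(tuple(f(o) for f in objectives), o) for o in objects]
    return [o for v, o in scored if not any(dominates(w, v) for w, _ in scored)]


# Hypothetical data: (score on predicate 1, score on predicate 2).
data = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]


def mean(o: Obj) -> float:
    return 0.5 * o[0] + 0.5 * o[1]


best = top_k(data, mean, k=1)                     # one compensation function
sky = skyline(data)                               # predicates incomparable
single = non_dominated(data, [mean])              # one objective: best object
per_pred = non_dominated(data, [lambda o: o[0], lambda o: o[1]])  # one objective per predicate
```

In this sketch, a single objective function makes `non_dominated` return the best-scoring object (the k = 1 case of top k retrieval), while one objective per predicate reproduces the skyline. Groupings of predicates between these two extremes correspond to the intermediate cases the abstract mentions.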
Similar resources
Threshold Phenomena in k-Dominant Skylines of Random Samples
Skylines emerged as a useful notion in database queries for selecting representative groups in multivariate data samples for further decision making, multi-objective optimization or data processing, and the k-dominant skylines were naturally introduced to resolve the abundance of skylines when the dimensionality grows or when the coordinates are negatively correlated. We prove in this paper tha...
Approaching the Efficient Frontier: Cooperative Database Retrieval Using High-Dimensional Skylines
Cooperative database retrieval is a challenging problem: top k retrieval delivers manageable results only when a suitable compensation function (e.g. a weighted mean) is explicitly given. On the other hand skyline queries offer intuitive querying to users, but result set sizes grow exponentially and hence can easily exceed manageable levels. We show how to combine the advantages of skyline quer...
Efficient Skyline Queries under Weak Pareto Dominance
Skylines with partial order preference semantics often result in huge answer sets and what is worse, they cannot be computed efficiently. In this paper we will explore the evaluation of so-called restricted skyline queries with partial order preferences under the paradigm of weak Pareto dominance. Weak Pareto dominance removes all objects from skylines, which are dominated by other objects in s...
Query Ordering Based Top-k Algorithms for Qualitatively Specified Preferences
Preference modelling and management has attracted considerable attention in the areas of Databases, Knowledge Bases and Information Retrieval Systems in recent years. This interest stems from the fact that a rapidly growing class of untrained lay users confront vast data collections, usually through the Internet, typically lacking a clear view of either content or structure, moreover, not even ...
Estimation of Potential Product Using Reverse Top-k Queries
At present, most applications return to the user a limited set of ranked results based on the individual user's preferences, which are commonly validated through top-k queries. From the perspective of a manufacturer, it is imperative that the products appear in the highest ranked positions for many different user preferences, otherwise the product is not visible to the potential customers...
Publication date: 2004